Information-Theoretic Models of Tagging

Author

  • Harry Halpin
Abstract

In earlier work, we showed using Kullback-Leibler (KL) divergence that tags form a power-law distribution very quickly. Yet there is one major observed deviation from the ideal power-law distribution for the top 25 tags: a large “bump” of increased frequency for the top 7-10 tags. We originally hypothesized that the “bump” in the data could be caused by a preferential attachment mechanism. However, an experiment that tested both feedback and no-feedback conditions over tagging (200+ subjects) shows that the power-law distribution arises regardless of any feedback effect. We hypothesize that an information-theoretic analysis of tags can explain how a power law arises without feedback.

In Halpin, Robu, and Shepherd, we showed using Kullback-Leibler (KL) divergence that tags form a power-law distribution very quickly. This can be demonstrated by taking the KL divergence between the tag distributions at every two consecutive points in time; the distribution has stabilized when this divergence goes to zero. An alternative method is to take the KL divergence with respect to an “ideal” power law, checking at each iteration whether the KL divergence decreases. We demonstrated that both of these measures converge quickly, using 500 randomly selected tags from del.icio.us, drawn from both “popular” (heavily tagged) and “recent” (randomly chosen) tags, by inspecting their tagging histories [2]. Yet there is one major observed deviation from the ideal power-law distribution for the top 25 tags: a large “bump” of increased frequency for the top 7-10 tags. We originally hypothesized that the “bump” in the data could be caused by a preferential attachment mechanism. However, a recent experiment that tested both feedback and no-feedback conditions over tagging (200+ subjects) shows that the power-law distribution arises regardless of any feedback effect [1]. There is some increased variance in tags without feedback, and reinforced tags move up the power-law distribution more quickly with feedback.

Yet even without feedback, the fundamental power-law distribution arises. Can an information-theoretic analysis of tags lead to a power law without feedback? In a classical information retrieval paradigm, each group of tags would have an entropy assigned to it depending on which URIs it retrieves. For example, a tag applied to every relevant resource would retrieve every document, and so have an entropy of 0, while a tag that selects a single relevant resource would have an entropy of 1. Obviously, tags should not retrieve a single resource, but some small constant number, such as seven. If users selected a non-ideal yet approximate and useful encoding with every choice, a power law could result.

Dagstuhl Seminar Proceedings 08391, Social Web Communities. http://drops.dagstuhl.de/opus/volltexte/2008/1787
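The two stabilization measures described in the abstract (KL divergence between consecutive snapshots of a tag distribution, and KL divergence against an “ideal” power law) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the snapshot interval `step`, the smoothing constant, the direction of the divergence, and the power-law exponent `alpha` are all assumptions introduced here.

```python
import math
from collections import Counter


def distribution(tags, vocabulary, smoothing=1e-9):
    """Relative tag frequencies over a fixed vocabulary.

    The small smoothing term (an assumption) keeps every probability
    non-zero so that the KL divergence stays finite.
    """
    counts = Counter(tags)
    total = sum(counts[t] + smoothing for t in vocabulary)
    return [(counts[t] + smoothing) / total for t in vocabulary]


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits for two aligned distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consecutive_kl(tag_stream, step):
    """KL divergence between each pair of consecutive snapshots.

    A snapshot is the cumulative tag distribution after every `step`
    tagging events; values approaching zero suggest stabilization.
    """
    vocabulary = sorted(set(tag_stream))
    snapshots = [
        distribution(tag_stream[:end], vocabulary)
        for end in range(step, len(tag_stream) + 1, step)
    ]
    return [
        kl_divergence(later, earlier)
        for earlier, later in zip(snapshots, snapshots[1:])
    ]


def ideal_power_law(n, alpha=1.0):
    """Normalized power law over ranks 1..n with assumed exponent alpha."""
    weights = [rank ** -alpha for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_to_power_law(tags, alpha=1.0):
    """KL divergence of the rank-ordered tag frequencies from a power law."""
    counts = sorted(Counter(tags).values(), reverse=True)
    total = sum(counts)
    p = [c / total for c in counts]
    return kl_divergence(p, ideal_power_law(len(p), alpha))
```

For a tagging history given as a list of tag strings in chronological order, `consecutive_kl(history, step)` yields one value per pair of adjacent snapshots, and `kl_to_power_law(history[:end])` can be evaluated at increasing `end` to check whether the divergence from the ideal curve decreases over time.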

Similar resources

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, creating a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

Combination of real options and game-theoretic approach in investment analysis

Investments in technology create a large amount of capital investments by major companies. Assessing such investment projects is identified as critical to the efficient assignment of resources. Viewing investment projects as real options, this paper expands a method for assessing technology investment decisions in the linkage existence of uncertainty and competition. It combines the game-theore...

A part-of-speech tagging system for the Persian language

Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, automatic speech recognition, etc. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Part-of-speech tagging of the Persian language using a fuzzy network model

Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

A game Theoretic Approach to Pricing, Advertising and Collection Decisions adjustment in a closed-loop supply chain

This paper considers advertising, collection and pricing decisions simultaneously for a closed-loop supply chain (CLSC) with one manufacturer (he) and two retailers (she). A new multiplicatively separable demand function is proposed, which is influenced by pricing and advertising. In this paper, three well-known scenarios in game theory, including the Nash, Stackelberg and Cooperative games, are expl...

Modeling gene regulatory networks: Classical models, optimal perturbation for identification of network

Deep understanding of molecular biology has allowed emergence of new technologies like DNA decryption. On the other hand, advancements of molecular biology have made manipulation of genetic systems simpler than ever; this promises extraordinary progress in biological, medical and biotechnological applications. This is not an unrealistic goal since genes which are regulated by gene regulatory ...

Publication date: 2008